Exploiting lower face symmetry in appearance-based automatic speechreading
Abstract
Appearance-based visual speech feature extraction is widely used in the automatic speechreading and audio-visual speech recognition literature. In its most common application, the discrete cosine transform (DCT) is used to compress the image of the speaker’s mouth region-of-interest (ROI), and the highest-energy spatial frequency components are retained as visual features. Good generalization performance of the resulting system, however, requires robust ROI extraction and consistent ROI normalization, designed to compensate for speaker head pose and other data variations. In general, one expects a correctly normalized ROI to be nearly laterally symmetric, due to the approximate symmetry of human faces. We thus argue that forcing lateral ROI symmetry can benefit automatic speechreading by providing a mechanism to compensate for small face and mouth tracking errors, which would otherwise result in incorrect ROI normalization. In this paper, we propose to achieve such ROI symmetry indirectly, by working in the spatial frequency domain and exploiting the properties of the DCT. In particular, we propose to remove the odd-frequency DCT components from the selected visual feature vector. We experimentally demonstrate that, in general, this approach does not hurt speechreading performance, while it reduces computation, since it yields fewer DCT features. In addition, for the same number of features as in traditional DCT coefficient selection, the method results in significant speechreading improvements. For the connected-digit automatic speechreading experiments considered, and for low feature dimensionalities, these improvements reach up to a 12% relative reduction in word error rate.
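The link between odd-frequency DCT components and lateral symmetry can be illustrated with a small sketch. It assumes an orthonormal 2-D DCT-II and a toy 4x6 array standing in for the mouth ROI; the helper names (`dct_matrix`, `dct2`, `idct2`, `drop_odd_horizontal`) are hypothetical and not taken from the paper. Under these assumptions, zeroing the coefficients with odd horizontal frequency index and inverting the transform gives the same image as averaging the ROI with its left-right mirror.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C of size n x n (row k = frequency k)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(img):
    """2-D DCT: coef = C_h * img * C_w^T (columns index horizontal frequency)."""
    ch, cw = dct_matrix(len(img)), dct_matrix(len(img[0]))
    return matmul(matmul(ch, img), transpose(cw))


def idct2(coef):
    """Inverse 2-D DCT; valid because the DCT matrices are orthonormal."""
    ch, cw = dct_matrix(len(coef)), dct_matrix(len(coef[0]))
    return matmul(matmul(transpose(ch), coef), cw)


def drop_odd_horizontal(coef):
    """Zero every coefficient whose horizontal frequency index is odd."""
    return [[c if v % 2 == 0 else 0.0 for v, c in enumerate(row)]
            for row in coef]


# Toy, deliberately asymmetric "ROI" (illustrative values, not real image data).
roi = [
    [1.0, 4.0, 2.0, 7.0, 3.0, 9.0],
    [5.0, 0.0, 6.0, 2.0, 8.0, 1.0],
    [3.0, 3.0, 9.0, 4.0, 0.0, 2.0],
    [7.0, 1.0, 5.0, 6.0, 2.0, 8.0],
]

symmetric = idct2(drop_odd_horizontal(dct2(roi)))

# Reference: average of the ROI and its left-right mirror image.
mirror_avg = [[(row[j] + row[-1 - j]) / 2 for j in range(len(row))]
              for row in roi]

max_err = max(abs(a - b)
              for ra, rb in zip(symmetric, mirror_avg)
              for a, b in zip(ra, rb))
print("max deviation from mirror average:", max_err)
```

In a feature extractor one would drop the odd-index coefficients from the feature vector rather than zero them, so the same symmetry constraint also shortens the vector. How the remaining coefficients are ranked and selected is not specified in the abstract and is left out of this sketch.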
Similar resources
A hierarchy probability-based visual features extraction method for speechreading
This research is supported by the President Foundation of the Institute of Acoustics, Chinese Academy of Sciences (No. 98-02) and the “863” High Tech R&D Project of China (No. 863-306-ZD-11-1). Abstract: Visual feature extraction has become the key technique in automatic speechreading systems. However, it remains a difficult problem due to large inter-person and intra-person appearance ...
Automatic Face Recognition via Local Directional Patterns
Automatic facial recognition has many potential applications in different areas of human-computer interaction. However, they are not yet fully realized due to the lack of an effective facial feature descriptor. In this paper, we present a new appearance-based feature descriptor, the local directional pattern (LDP), to represent facial geometry and analyze its performance in recognition. An LDP feat...
Lip representation by image ellipse
Automatic speechreading systems through their use of visual information to support the acoustic signal have been shown to yield better recognition performance than purely acoustic systems, especially when background noise is present. In this paper an answer is sought to the most important questions of speechreading: Which features can represent visual information well? How can they be extracted...
Scattering vs. discrete cosine transform features in visual speech processing
Appearance-based feature extraction constitutes the dominant approach for visual speech representation in a variety of problems, such as automatic speechreading, visual speech detection, and others. To obtain the necessary visual features, typically a rectangular region-of-interest (ROI) containing the speaker’s mouth is first extracted, followed, most commonly, by a discrete cosine transform (...
A speechreading aid based on phonetic ASR
Manual Cued Speech (MCS) is an effective method of communication by the deaf and hearing-impaired. We first describe our work on assessing the feasibility of automatic determination and presentation of cues without intervention by the speaker. The conclusions of this study are then applied to the design and implementation of a prototype automatic cueing system using HMM-based automatic speech r...